iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data

Author: A Siepel
BE Suzek
CCLC Chang
E Byvatov
Hao Sun
Huating Wang
J Liu
Kun Sun
M Clamp
ME Dinger
MF Lin
P Carninci
P Kapranov
Peiyong Jiang
RT Arrial
SF Altschul
TR Mercer
Xiaofeng Song
Xiaona Chen
Z Wang
Publication venue: Springer Science and Business Media LLC

Support vector machine model for diagnosis of lymph node metastasis in gastric cancer with multidetector computed tomography: a preliminary study

Author: AY Kim
CC Chang
CY Chen
CY Wu
E Byvatov
EH Bollschweiler
Farid E Ahmed
FL Greene
HJ Lee
J Mourão-Miranda
J Nasu
Japanese Gastric Cancer Association
JL Patel
K Das
KA McQuisten
Kun Cao
L Shen
Lei Tang
M Pirooznia
Mehtap Tunaci
RE Dorfman
RM Kwee
S Klöppel
S Kumano
SJ Deutch
T Fukuya
XF Zhang
Xiao-Peng Zhang
Y Fang
Ying-Shi Sun
Yun Gao
Zhi-Long Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Lymph node metastasis (LNM) of gastric cancer is an important prognostic factor for long-term survival. However, the imaging techniques commonly used for the stomach cannot satisfactorily assess lymph node status in gastric cancer, because they do not achieve both high sensitivity and high specificity. As a machine-learning method, the support vector machine (SVM) has the potential to address this problem. Methods: The institutional review board approved this retrospective study. A total of 175 consecutive patients with gastric cancer who underwent MDCT before surgery were included. We evaluated six tumor and lymph node indicators on CT images that reflect the biological behavior of gastric cancer: serosal invasion, tumor classification, maximum tumor diameter, number of lymph nodes, maximum lymph node size, and lymph node station. Univariate analysis was used to assess the relationship between each of the six image indicators and LNM. An SVM model was built with these indicators as inputs; the output was whether the patient's lymph node metastasis status was positive or negative, as confirmed by surgery and histopathology. A standard machine-learning technique, k-fold cross-validation (5-fold in our study), was used to train and test the SVM models. We evaluated the diagnostic capability of the SVM models for lymph node metastasis with receiver operating characteristic (ROC) curves. For comparison, a radiologist classified each patient's lymph node metastasis status using maximum lymph node size on CT images as the criterion, and we compared the areas under the ROC curves (AUC) of the radiologist and the SVM models. Results: Of the 175 patients, 134 had lymph node metastasis and 41 did not. All six image indicators differed significantly between the LNM-negative and LNM-positive groups. The mean sensitivity, specificity, and AUC of the SVM models with 5-fold cross-validation were 88.5%, 78.5%, and 0.876, respectively, whereas the corresponding values for the radiologist classifying by maximum lymph node size were only 63.4%, 75.6%, and 0.757. Each SVM model of the 5-fold cross-validation performed significantly better than the radiologist. Conclusions: Based on biological behavior information of gastric cancer on MDCT images, an SVM model can help diagnose lymph node metastasis preoperatively.
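
As a rough illustration of the metrics named in this abstract, the sketch below computes sensitivity and specificity from confusion-matrix counts and estimates AUC as the fraction of positive/negative pairs that a score ranks correctly. The function names are hypothetical, and the counts (85 true positives, 31 true negatives) are not reported in the abstract: they are back-calculated assumptions that are consistent with the radiologist's 63.4% and 75.6% given 134 positive and 41 negative patients.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives detected
    specificity = tn / (tn + fp)  # fraction of true negatives detected
    return sensitivity, specificity


def auc_from_scores(pos_scores, neg_scores):
    """Estimate AUC as the probability that a random positive case
    scores higher than a random negative case (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Assumed counts (not from the abstract): 134 LNM-positive, 41 LNM-negative.
sens, spec = sensitivity_specificity(tp=85, fn=49, tn=31, fp=10)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")

# Toy scores, purely illustrative (e.g. maximum lymph node size in mm).
toy_auc = auc_from_scores(pos_scores=[3, 2], neg_scores=[1, 2])
print(f"toy AUC={toy_auc}")
```

The pairwise AUC estimate is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formula would scale better for larger data.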

Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies

Author: A Chu
A Vinayagam
B Boser
B Giraldo
B Schölkopf
C Cortes
D Benoit
D Hand
D Hosmer
DD Benoit
E Byvatov
ER DeLong
F De Turck
I Guyon
J Decruyenaere
JE Zimmerman
JS Groeger
L Ohno-Machado
M Soares
P Depuydt
S Lemeshow
S Van Looy
S Vansteelandt
SL Zeger
T Verplancke
WS Noble
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Several models for mortality prediction have been constructed for critically ill patients with haematological malignancies in recent years. These models have proven to be equally or more accurate in predicting hospital mortality in patients with haematological malignancies than ICU severity of illness scores such as the APACHE II or SAPS II [1]. The objective of this study is to compare the accuracy of models based on multiple logistic regression (MLR) and models based on support vector machines (SVM) in predicting hospital mortality in patients with haematological malignancies admitted to the ICU. Methods: 352 patients with haematological malignancies admitted to the ICU between 1997 and 2006 for a life-threatening complication were included. 252 patient records were used for training the models and 100 were used for validation. In a first model, 12 input variables were included for comparison between MLR and SVM. In a second, more complex model, 17 input variables were used. The MLR and SVM analyses were performed independently of each other. Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve (+/- SE). Results: The areas under the ROC curve for MLR and SVM in the validation data set were 0.768 (+/- 0.04) vs. 0.802 (+/- 0.04) in the first model (p = 0.19) and 0.781 (+/- 0.05) vs. 0.808 (+/- 0.04) in the second, more complex model (p = 0.44). SVM needed only 4 variables to make its prediction in both models, whereas MLR needed 7 and 8 variables in the first and second model, respectively. Conclusion: The discriminative power of both the MLR and SVM models was good. No statistically significant differences were found in discriminative power between MLR and SVM for prediction of hospital mortality in critically ill patients with haematological malignancies.

Ghent University Academic Bibliography

Prediction of potential drug targets based on simple sequence properties

Author: AL Hopkins
AP Russ
BP Zambrowicz
BS Khakh
C Cortes
CC Chang
CH Ding
CZ Cai
DS Wishart
E Byvatov
HC Huang
HH Luu
I Dubchak
J An
J Drews
J Luo
J Taipale
JP Overington
K Bedard
KH Krause
KR Muller
Luhua Lai
LW Hardy
LY Han
ME van Gijn
P Imming
PD Dobson
PJ Hajduk
Qingliang Li
RD Finn
RT Moon
S Mullner
SP Butcher
UA Betz
X Chen
Y Katoh
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: During the past decades, research and development in drug discovery have attracted much attention and effort. However, only 324 drug targets are known for clinical drugs up to now. Identifying potential drug targets is the first step in the process of modern drug discovery for developing novel therapeutic agents. Therefore, the identification and validation of new and effective drug targets are of great value for drug discovery in both academia and the pharmaceutical industry. If a protein can be predicted in advance to be a potential drug target, the drug discovery process targeting this protein will be greatly accelerated. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast identification of novel drug targets. Results: Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed in this study. The best model can distinguish currently known drug targets from non-drug targets with an accuracy of 84%. Using this model, potential protein drug targets of human origin from Swiss-Prot were predicted, some of which have already attracted much attention as potential drug targets in pharmaceutical research. Conclusion: We have developed a drug target prediction method based solely on protein sequence information, without knowledge of family/domain annotation or the protein 3D structure. This method can be applied in novel drug target identification and validation, as well as in genome-scale drug target prediction.

In silico approach to screen compounds active against parasitic nematodes of major socio-economic importance

Author: A Harder
A Tropsha
AJ Bokisch
AM Mayer
C Cortes
CE James
CY Liew
D Dutta
D Wishart
D Woods
DF Cully
E Byvatov
E Lacey
E Marchiori
GW Bemis
H Geppert
IA Sutherland
J Keiser
J Overington
L Holden-Dye
MK Warmuth
MWB Trotter
O Ivanciuc
P Kohler
PA Friedman
R Burbidge
R Kaminsky
RF Freitas
RI Jennrich
RJ Martin
RN Jorissen
S Geerts
S Ranganathan
S Reddy
S-H Xiao
Shoba Ranganathan
Sr Sousa
Varun Khanna
VV Zernov
W Duch
Y Hu
Y Marrero-Ponce
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Infections due to parasitic nematodes are common causes of morbidity and fatality around the world, especially in developing nations. At present, however, there are only three major classes of drugs for treating human nematode infections. Additionally, the mechanisms of action of these drugs and the reasons for resistance to them are poorly understood. Commercial incentives to design drugs for diseases that are endemic to developing countries are limited; therefore, virtual screening in academic settings can play a vital role in discovering novel drugs against neglected diseases. In this study we propose to build a robust machine learning model to classify and screen compounds active against parasitic nematodes. A set of compounds active against parasitic nematodes was collated from various literature sources, including PubChem, while the inactive set was derived from the DrugBank database. The support vector machine (SVM) algorithm was used for model development, and stratified ten-fold cross-validation was used to evaluate the performance of each classifier. The best results were obtained using the radial basis function kernel. The SVM method achieved an accuracy of 81.79% on an independent test set. Using this model, we were able to identify novel compounds with potential anthelmintic activity. In this study, we successfully applied the SVM approach to predicting compounds active against parasitic nematodes, which suggests the effectiveness of computational approaches for antiparasitic drug discovery. Although the accuracy obtained is lower than that previously reported in a similar study, we believe that our model is more robust because we intentionally employed stringent criteria to select the inactive dataset, thus making it more difficult for the model to classify compounds. The method presents an alternative to existing traditional methods and may be useful for predicting novel anthelmintic compounds.
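
The stratified ten-fold cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the random seed, and the 30 active / 70 inactive toy labels are all hypothetical, and the abstract does not say how folds were constructed.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so that each fold keeps
    roughly the same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical labels: 1 = active compound, 0 = inactive compound.
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=10)
splits = list(train_test_splits(folds))
print(len(splits), "splits; test fold sizes:", [len(test) for _, test in splits])
```

In each split a classifier would be trained on the nine remaining folds and scored on the held-out fold, with the ten scores then averaged.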

Macquarie University ResearchOnline

ScholarBank@NUS

Korarchaeota Diversity, Biogeography, and Abundance in Yellowstone and Great Basin Hot Springs and Ecological Niche Modeling Based on Machine Learning

Author: A Schramm
A Teske
AE Smith
AL Reysenbach
Amanda J. Williams
Austin I. McDonald
AV Palumbo
B Fry
B McCune
BP Hedlund
Brian P. Hedlund
C Chang
C Cortes
C Ross
C Takacs-Vesbach
CA Lozupone
CC Farrar
Christian A. Ross
CJ Ehrhardt
DB Johnson
DJ Lane
DK Nordstrom
DR Meyer-Dombard
E Byvatov
EL Shock
ES Boyd
Everett L. Shock
H Hirayama
Hilairy E. Hartnett
J Felsenstein
J Mathur
J Shen
JA Dyer
JB Navarro
Jeff R. Havig
Jeremy A. Dodsworth
JG Elkins
JI Hedges
JL Gardy
JR Spear
JT Staley
K Takai
KC Costa
KE Ashelford
KL Rogers
LGM Baas-Becking
LJ Reigstad
MA Goni
Melanie R. Mormile
MT Madigan
O Nercessian
P Rice
R de Wit
RE Zehner
RJ Whitaker
Robin L. Miller-Coleman
S Burggraf
S Nakagawa
S Skirnisdottir
SM Barns
SY Rha
T Huber
T Nakagawa
TA Auchtung
TD Brock
TJ Vick
VT Marteinsson
W Inskeep
W Ludwig
Z Šidàk
Publication venue: Public Library of Science
Publication date: 04/05/2012

Over 100 hot spring sediment samples were collected from 28 sites in 12 areas/regions, while recording as many coincident geochemical properties as feasible (>60 analytes). PCR was used to screen samples for Korarchaeota 16S rRNA genes. Over 500 Korarchaeota 16S rRNA genes were screened by RFLP analysis and 90 were sequenced, resulting in identification of novel Korarchaeota phylotypes and exclusive geographical variants. Korarchaeota diversity was low, as in other terrestrial geothermal systems, suggesting a marine origin for Korarchaeota with subsequent niche-invasion into terrestrial systems. Korarchaeota endemism is consistent with endemism of other terrestrial thermophiles and supports the existence of dispersal barriers. Korarchaeota were found predominantly in >55°C springs at pH 4.7–8.5 at concentrations up to 6.6×10^6 16S rRNA gene copies per gram of wet sediment. In Yellowstone National Park (YNP), Korarchaeota were most abundant in springs with a pH range of 5.7 to 7.0. High sulfate concentrations suggest these fluids are influenced by contributions from hydrothermal vapors that may be neutralized to some extent by mixing with water from deep geothermal sources or meteoric water. In the Great Basin (GB), Korarchaeota were most abundant at spring sources of pH<7.2 with high particulate C content and high alkalinity, which are likely to be buffered by the carbonic acid system. It is therefore likely that at least two different geological mechanisms in YNP and GB springs create the neutral to mildly acidic pH that is optimal for Korarchaeota. A classification support vector machine (C-SVM) trained on single analytes, two-analyte combinations, or vectors from non-metric multidimensional scaling models was able to predict springs as Korarchaeota-optimal or sub-optimal habitats with accuracies up to 95%. To our knowledge, this is the most extensive analysis of the geochemical habitat of any high-level microbial taxon and the first application of a C-SVM to microbial ecology.

Public Library of Science (PLOS)